AITopics

arXiv.org Artificial Intelligence, Sep-30-2024

T-KAER: Transparency-enhanced Knowledge-Augmented Entity Resolution Framework

Li, Lan, Fang, Liri, Liu, Yiren, Torvik, Vetle I., Ludaescher, Bertram

Entity resolution (ER) is the process of determining whether two representations refer to the same real-world entity, and it plays a crucial role in data curation and data cleaning. Recent studies have introduced the KAER framework, which aims to improve pre-trained language models by augmenting them with external knowledge. However, identifying and documenting the external knowledge being augmented, and understanding its contribution to the model's predictions, have received little to no attention in the research community. This paper addresses this gap by introducing T-KAER, the Transparency-enhanced Knowledge-Augmented Entity Resolution framework. To enhance transparency, three Transparency-related Questions (T-Qs) are proposed: T-Q(1): What is the experimental process for matching results based on data inputs? T-Q(2): Which semantic information does KAER augment in the raw data inputs? T-Q(3): Which semantic information of the augmented data inputs influences the predictions? To address the T-Qs, T-KAER documents the entity resolution processes in log files. In experiments, a citation dataset is used to demonstrate the transparency components of T-KAER. This demonstration shows how T-KAER facilitates error analysis from both quantitative and qualitative perspectives, providing evidence on "what" semantic information is augmented and "why" the augmented knowledge influences predictions differently.

doduo, information, sherlock, (14 more...)

2410.00218

Country: North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

arXiv.org Artificial Intelligence, Feb-20-2024

Unlocking Insights: Semantic Search in Jupyter Notebooks

Li, Lan, Lv, Jinpeng

Semantic search, a process aimed at delivering highly relevant search results by comprehending the searcher's intent and the contextual meaning of terms within a searchable dataspace, plays a pivotal role in information retrieval. In this paper, we investigate the application of large language models to enhance semantic search capabilities, specifically tailored for the domain of Jupyter Notebooks. Our objective is to retrieve generated outputs, such as figures or tables, associated functions and methods, and other pertinent information. We demonstrate a semantic search framework that achieves a comprehensive semantic understanding of the entire notebook's contents, enabling it to handle various types of user queries. Key components of this framework include: (1) a data preprocessor designed to handle the diverse cell types within Jupyter Notebooks, covering both markdown and code cells; and (2) a methodology for addressing the token size limitations that arise with code-type cells. We implement a finer-grained approach to data input, moving from the cell level to the function level, which resolves these limitations.
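The move from cell-level to function-level inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `split_cell_into_functions` and the chunk dictionary layout are assumptions made for illustration, and the sketch relies only on Python's standard `ast` module. It assumes the cell contains plain Python; cells with IPython magics such as `%matplotlib inline` would need to be cleaned first, because `ast.parse` rejects them.

```python
import ast


def split_cell_into_functions(cell_source: str) -> list[dict]:
    """Split one code cell into function-level chunks plus one remainder chunk.

    Hypothetical helper: each top-level function becomes its own chunk, and
    the remaining non-blank top-level lines are grouped into a single chunk.
    """
    tree = ast.parse(cell_source)  # raises SyntaxError on IPython magics
    lines = cell_source.splitlines()
    chunks = []
    covered = set()  # 1-based line numbers already assigned to a function

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno and end_lineno are 1-based and inclusive (Python 3.8+).
            start = node.lineno
            if node.decorator_list:
                # Keep decorators together with the function they decorate.
                start = min(d.lineno for d in node.decorator_list)
            end = node.end_lineno
            chunks.append({
                "kind": "function",
                "name": node.name,
                "text": "\n".join(lines[start - 1:end]),
            })
            covered.update(range(start, end + 1))

    rest = [
        line for number, line in enumerate(lines, start=1)
        if number not in covered and line.strip()
    ]
    if rest:
        chunks.append({"kind": "other", "name": None, "text": "\n".join(rest)})
    return chunks


cell = (
    "import math\n"
    "\n"
    "def area(r):\n"
    "    return math.pi * r ** 2\n"
    "\n"
    "def perimeter(r):\n"
    "    return 2 * math.pi * r\n"
    "\n"
    "print(area(1.0))\n"
)
chunks = split_cell_into_functions(cell)
for chunk in chunks:
    print(chunk["kind"], chunk["name"])
```

Each chunk is smaller than the whole cell, so it is more likely to fit within a model's token limit; how the chunks are then embedded or indexed is outside the scope of this sketch.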

jupyter notebook, notebook, semantic search, (14 more...)

2402.13234

Country:

North America > United States > Illinois (0.05)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

arXiv.org Artificial Intelligence, Jan-26-2024

GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption

Frimpong, Eugene, Nguyen, Khoa, Budzys, Mindaugas, Khan, Tanveer, Michalas, Antonis

Machine Learning (ML) has emerged as one of data science's most transformative and influential domains. However, the widespread adoption of ML introduces privacy-related concerns owing to the increasing number of malicious attacks targeting ML models. To address these concerns, Privacy-Preserving Machine Learning (PPML) methods have been introduced to safeguard the privacy and security of ML models. One such approach is the use of Homomorphic Encryption (HE). However, the significant drawbacks and inefficiencies of traditional HE render it impractical for highly scalable scenarios. Fortunately, a modern cryptographic scheme, Hybrid Homomorphic Encryption (HHE), has recently emerged, combining the strengths of symmetric cryptography and HE to surmount these challenges. Our work seeks to introduce HHE to ML by designing a PPML scheme tailored for end devices. We leverage HHE as the fundamental building block to enable secure learning of classification outcomes over encrypted data, all while preserving the privacy of the input data and ML model. We demonstrate the real-world applicability of our construction by developing and evaluating an HHE-based PPML application for classifying heart disease based on sensitive ECG data. Notably, our evaluations revealed a slight reduction in accuracy compared to inference on plaintext data. Additionally, both the analyst and end devices experience minimal communication and computation costs, underscoring the practical viability of our approach. The successful integration of HHE into PPML provides a glimpse into a more secure and privacy-conscious future for machine learning on relatively constrained end devices.

ciphertext, hhe, protocol, (15 more...)

2401.1484

Country:

Europe > Finland > Pirkanmaa > Tampere (0.05)
North America > United States (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.61)

arXiv.org Artificial Intelligence, Sep-5-2023

Coincident Learning for Unsupervised Anomaly Detection

Humble, Ryan, Zhang, Zhe, O'Shea, Finn, Darve, Eric, Ratner, Daniel

Anomaly detection is an important task for complex systems (e.g., industrial facilities, manufacturing, large-scale science experiments), where failures in a sub-system can lead to low yield, faulty products, or even damage to components. While complex systems often have a wealth of data, labeled anomalies are typically rare (or even nonexistent) and expensive to acquire. Unsupervised approaches are therefore common and typically search for anomalies either by distance or density of examples in the input feature space (or some associated low-dimensional representation). This paper presents a novel approach called CoAD, which is specifically designed for multi-modal tasks and identifies anomalies based on coincident behavior across two different slices of the feature space. We define an unsupervised metric, $\hat{F}_\beta$, by analogy to the supervised classification $F_\beta$ statistic. CoAD uses $\hat{F}_\beta$ to train an anomaly detection algorithm on unlabeled data, based on the expectation that anomalous behavior in one feature slice is coincident with anomalous behavior in the other. The method is illustrated using a synthetic outlier data set and an MNIST-based image data set, and is compared to prior state-of-the-art on two real-world tasks: a metal milling data set and a data set from a particle accelerator.
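For reference, the supervised $F_\beta$ statistic that the abstract takes as its starting point has the standard form below, where $P$ is precision and $R$ is recall. The definition of the unsupervised $\hat{F}_\beta$ is given in the paper itself and is not reproduced or assumed here.

```latex
F_\beta = (1 + \beta^2)\,\frac{P \cdot R}{\beta^2 P + R},
\qquad
P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}},
\qquad
R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}
```

Setting $\beta = 1$ gives the familiar $F_1$ score, the harmonic mean of precision and recall; values of $\beta > 1$ weight recall more heavily.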

anomaly detection, representation, theorem 2, (15 more...)

2301.11368

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Law (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

arXiv.org Artificial Intelligence, Jul-21-2023

Framework for developing quantitative agent based models based on qualitative expert knowledge: an organised crime use-case

Oetker, Frederike, Nespeca, Vittorio, Vis, Thijs, Duijn, Paul, Sloot, Peter, Quax, Rick

To model criminal networks for law enforcement purposes, a limited supply of data needs to be translated into validated agent-based models (ABMs). What is missing in current criminological modelling is a systematic and transparent framework for modellers and domain experts that establishes a procedure for computational criminal modelling, including the translation of qualitative data into quantitative rules. For this, we propose FREIDA (Framework for Expert-Informed Data-driven Agent-based models). Throughout the paper, the criminal cocaine replacement model (CCRM) is used as an example case to demonstrate the FREIDA methodology. In the CCRM, a criminal cocaine network in the Netherlands is modelled in which the kingpin node is removed, the goal being for the remaining agents to reorganize after the disruption and return the network to a stable state. Qualitative data sources such as case files, literature and interviews are translated into empirical laws and, combined with quantitative sources such as databases, form the three dimensions (environment, agents, behaviour) of a networked ABM. Four case files are modelled and scored, with training scores and validation scores used to transition to the computational model phase and the application phase, respectively. In the last phase, iterative sensitivity analysis, uncertainty quantification and scenario testing eventually lead to a robust model that can help law enforcement plan their intervention strategies. Results indicate the need for flexible parameters as well as additional case file simulations.

agent, artificial intelligence, case file, (17 more...)

2308.00505

Country:

Europe > Netherlands > South Holland > Rotterdam (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
South America > Colombia (0.04)

Genre:

Personal > Interview (0.47)
Research Report > New Finding (0.46)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

#artificialintelligence, Apr-1-2023, 14:40:45 GMT

Data Drift vs. Concept Drift: What Is the Difference? - DATAVERSITY

Model drift refers to the phenomenon that occurs when the performance of a machine learning model degrades with time. This happens for various reasons, including data distribution changes, changes in the goals or objectives of the model, or changes to the environment in which the model is operating. There are two main types of model drift that can occur: data drift and concept drift. Data drift refers to the changing distribution of the data to which the model is applied. Concept drift refers to a changing underlying goal or objective for the model.
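One way to check for data drift on a single numeric feature is to compare the distribution seen at training time with the distribution seen in production. The sketch below is an illustration only and does not come from the article: it uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distribution functions), and the function name `ks_statistic` and the threshold `DRIFT_THRESHOLD` are assumptions chosen for the example. It says nothing about concept drift, which generally requires labeled outcomes to detect.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical CDFs of
    the two samples, a value in [0, 1]; larger means more dissimilar.
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


# Hypothetical threshold; in practice it would be tuned or replaced by a p-value.
DRIFT_THRESHOLD = 0.2

rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same_dist = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(1000)]  # simulated data drift

for name, sample in [("same distribution", same_dist), ("shifted", shifted)]:
    stat = ks_statistic(training, sample)
    print(f"{name}: KS = {stat:.3f}, drift flagged = {stat > DRIFT_THRESHOLD}")
```

A statistic near 0 means the production data resembles the training data for that feature, while a value well above the chosen threshold suggests the input distribution has changed.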

concept drift, input data, model drift, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligence, Jan-22-2023, 07:15:25 GMT

AdTheorent Is Using Machine Learning To Predict Effective Inventory

Signal loss calls for the use of, well, other signals. "The biggest trend for us right now is finding ways to be less reliant on cookie data," said John Kirk, media director in charge of digital investment at 22Squared, an Atlanta-based media agency whose clients include Baskin-Robbins, Publix and Southeast Toyota. One alternative approach, Kirk said, is to "home in on audiences where we do have the data." In that vein, 22Squared has been testing a solution released by AdTheorent on Wednesday that uses machine learning to score programmatic inventory based on the probability that an impression will lead to a desired outcome. Southeast Toyota is also a launch partner for the product.

adtheorent, artificial intelligence, machine learning, (12 more...)

Industry:

Marketing (0.40)
Information Technology > Services (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.91)

#artificialintelligence, Jan-9-2023, 15:06:53 GMT

Top Learning Trends to Expect in 2023 - Beyond The Sky

What learning trends are in store for 2023? The coming year will bear witness to some significant changes across the entire business world. Recession poses an ever-constant threat so it's important that businesses equip themselves for the future. On top of it all, businesses are having trouble finding new recruits due to a labor shortage. So 2023 will force businesses to strengthen their current ranks to protect against a shallow recruitment pool.

artificial intelligence, current workforce, machine learning, (16 more...)

Country: North America (0.30)

Industry:

Health & Medicine (0.48)
Education (0.48)
Banking & Finance > Economy (0.48)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.48)
Information Technology > Artificial Intelligence > Machine Learning (0.47)

#artificialintelligence, Nov-11-2022, 07:20:38 GMT

AI system reconstructs words from brain data

Researchers demonstrate an AI system that can reconstruct semantic content in the form of text from fMRI data. A brain-computer interface that reconstructs language would have numerous applications in science, medicine, and industry. Invasive methods using recordings from surgically implanted electrodes show that it is possible to reconstruct language for simple brain control. But these interventions remain dangerous, even though companies like Elon Musk's Neuralink are working on methods to make such interventions as harmless as possible and without consequential damage. Non-invasive language decoders, however, could become commonplace and help people in the future to control technical devices by thought, for example.

ai system, ai system reconstruct word, decoder, (14 more...)

Country: North America > United States > Texas > Travis County > Austin (0.05)

Industry: Health & Medicine > Health Care Technology (0.61)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.33)